CpS 450 Language Translation Systems

Code Generation

Source to Source Translation

  • In the code generation phase, the compiler traverses the parse tree and emits target code
  • See examples/source2source
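
As a rough illustration of the traversal described above, the following C sketch walks a small expression tree in post-order and emits one line of target code per step. The Node type, the NodeKind values, and emit_expr are hypothetical names invented for this example. The target is assumed to be 32-bit AT&T assembly with a simple stack discipline; a source-to-source translator would emit statements in another high-level language instead.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical expression tree: a node is either an integer literal
   or a binary operation with two children. */
typedef enum { NODE_NUM, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;                  /* used only by NODE_NUM */
    const struct Node *left;    /* used only by NODE_ADD / NODE_MUL */
    const struct Node *right;
} Node;

/* Post-order traversal: emit code for both children first, then for
   the operator. Every subtree leaves its result on top of the stack. */
void emit_expr(const Node *n, FILE *out)
{
    assert(n != NULL);

    if (n->kind == NODE_NUM) {
        fprintf(out, "\tpushl $%d\n", n->value);
        return;
    }

    emit_expr(n->left, out);
    emit_expr(n->right, out);

    fputs("\tpopl %ebx\n", out);    /* right operand */
    fputs("\tpopl %eax\n", out);    /* left operand */
    fputs(n->kind == NODE_ADD ? "\taddl %ebx, %eax\n"
                              : "\timull %ebx, %eax\n", out);
    fputs("\tpushl %eax\n", out);   /* result back on the stack */
}
```

Calling emit_expr(&root, stdout) on a tree for 2 + 3 would print two pushl lines, two popl lines, an addl, and a final pushl. Keeping every intermediate result on the stack is wasteful, but it keeps the emitter simple because no register allocation is needed.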

Intro to Linux Assembler Programming

See examples/codegen/hello for a simple example of a Linux assembler program that displays some output.

One way to learn is by examining the assembly output of the gcc compiler. Create a small C test program, and invoke gcc with the -S switch:

gcc -S myprog.c

This produces myprog.s containing the generated assembly code.
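
A test program for this purpose can be very small. The file below is only a sketch: the function name add_three is made up for illustration, and any short function would do. Since these notes use 32-bit code, it may help to add gcc's -m32 switch on a 64-bit system so that the generated assembly uses 32-bit registers and stack-based argument passing.

```c
/* myprog.c -- a hypothetical input for "gcc -S myprog.c" */

/* Three int parameters and one return value: enough to see how
   arguments are accessed and how the result is returned. */
int add_three(int a, int b, int c)
{
    int sum = a + b;    /* a local variable, so the output shows stack usage */
    sum = sum + c;
    return sum;
}
```

No main function is needed here, because the -S switch stops after generating assembly and never links.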

Some notes about writing Linux assembly code:

  • Precede your data declarations with the .data directive
  • Precede executable instructions with the .text directive
  • Comment with #
  • Define numeric constants using
    constant_name = value

    Ex:

    STDOUT = 1

  • Labels start at the beginning of the line and end with a colon. Ex:
    hello:

AT&T Syntax

The syntax used by Unix assemblers, “AT&T syntax”, differs from the “Intel syntax” used by Windows assemblers in several ways:

  • The source operand comes first, followed by the destination operand
  • Register names are prefixed with %
  • Immediate (constant) operands are prefixed with $
  • Instruction names have suffixes to indicate the operand size: ‘l’ for long (32 bits), ‘w’ for word (16 bits), ‘b’ for byte (8 bits). To keep matters simple, your code will deal strictly with 32-bit operands.
  • Pointer dereference uses ( ) instead of [ ]

Compare Intel syntax to the AT&T syntax:

AT&T Syntax            Intel Syntax
movl $1, %eax          mov eax, 1
movl (%ebx), %eax      mov eax, [ebx]

Here’s a Hello World assembler program:

# hello.s -- Hello World in Linux Assembler

STDOUT = 1     # define a constant

.data
hello:
        .string "hello world\n"

.text
.global main
main:
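        # Call write(STDOUT, "hello world\n", 12)
        # Arguments are pushed right to left
        pushl   $12             # 3rd argument: number of bytes to write
        pushl   $hello          # 2nd argument: address of the string
        pushl   $STDOUT         # 1st argument: file descriptor
        call    write
        addl    $12, %esp       # remove the 3 arguments (3 x 4 bytes) from the stack

        # Call exit(0)
        pushl   $0
        call    exit            # no return...

        # or ...
        #movl   $1, %eax        # 1 is the number of the exit system call
        #movl   $0, %ebx        # 0 is the parameter for exit
        #int    $0x80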

To assemble this program, simply use gcc:

gcc hello.s -ohello

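gcc invokes the assembler and linker to produce the resulting executable. On a 64-bit Linux system, add the -m32 switch (gcc -m32 hello.s -ohello), since this is 32-bit code; the 32-bit C library must also be installed.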
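
For comparison, here is a sketch of roughly what hello.s does, written in C. The function name say_hello is hypothetical; the assembly version does this work directly in main. The three arguments to write() correspond to the three pushl instructions, in reverse order.

```c
#include <assert.h>
#include <unistd.h>

#define STDOUT 1    /* same constant as in hello.s */

/* Write the greeting to standard output.
   Returns the number of bytes written, or -1 on error. */
ssize_t say_hello(void)
{
    static const char hello[] = "hello world\n";

    /* sizeof counts the terminating '\0', so subtract 1 to get the
       12 bytes that hello.s passes as the length. */
    assert(sizeof hello - 1 == 12);

    return write(STDOUT, hello, sizeof hello - 1);
}
```

Calling say_hello() from main and then calling exit(0) would reproduce the behavior of the assembly program.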

Linux I/O

Our programs must have a way to do I/O. Since our code will run on the Linux platform, it will perform Linux system calls to do the I/O.

The Linux I/O system calls are fairly simple – read() gets input from an open file, and write() produces output to a file. You supply a file descriptor to specify which file to read from or write to.

In Linux, use the man command to get information on these functions:

man 2 read
man 2 write

You’ll be reading from stdin (file descriptor 0), and writing to stdout (file descriptor 1).
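
Before writing these calls in assembly, it may help to see them used from C. The sketch below copies bytes from one file descriptor to another using only read() and write(). The function name copy_fd and the 256-byte buffer size are assumptions made for this example.

```c
#include <assert.h>
#include <unistd.h>

#define STDIN  0    /* file descriptor for standard input */
#define STDOUT 1    /* file descriptor for standard output */

/* Copy everything that can be read from in_fd to out_fd.
   Returns the total number of bytes copied, or -1 on error. */
long copy_fd(int in_fd, int out_fd)
{
    char buf[256];
    long total = 0;
    ssize_t n;

    assert(in_fd >= 0 && out_fd >= 0);

    /* read() returns 0 at end of file and -1 on error */
    while ((n = read(in_fd, buf, sizeof buf)) > 0) {
        ssize_t done = 0;
        while (done < n) {
            /* write() may accept fewer bytes than requested */
            ssize_t w = write(out_fd, buf + done, (size_t)(n - done));
            if (w < 0)
                return -1;
            done += w;
        }
        total += n;
    }
    return n < 0 ? -1 : total;
}
```

Calling copy_fd(STDIN, STDOUT) echoes standard input to standard output until end of file. Note that both calls take the same three arguments: a file descriptor, a buffer address, and a byte count.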